Classification of Natural Language Sentences using Neural Networks
نویسندگان
چکیده
In this work the task of classifying natural language sentences using recurrent neural networks is considered. The goal is the classification of the sentences as grammatical or ungrammatical. An acceptable classification percentage was achieved, using encoded natural language sentences as examples to train a recurrent neural network. This encoding is based on the linguistic theory of Government and Binding. The behaviour of the recurrent neural network as a dynamical system is analyzed to extract finite automata that represent in some way the grammar of the language. A classifier system was developed to reach these goals, using the Backpropagation Through Time algorithm to train the neural net. The clustering algorithm Growing Neural Gas was used in the extraction of automata. Introduction Neural networks have been widely used in classification tasks and, in particular, multilayer feedforward networks are well-known and broadly implemented. However, in the case of grammars, it may be verified that recurrent neural networks have an inherent ability to simulate finite state automata (Haykin 1999), from which grammars of regular languages are inferred. The behaviour of a recurrent neural net (RNN) as a dynamical system can be analyzed (Haykin 1999) to construct a finite state automaton (Giles et al. 1992; Omlin & Giles 1996; Lawrence, Giles, & Fong 2000). However, regarding the natural language processing, it must be noted that grammars of natural languages cannot be completely represented by finite state models, due to their hierarchical structures (Pereira & Shabes 1992). Nevertheless, it has been shown that recurrent networks have the representational power required for hierarchical solutions (Elman 1991), and therefore, a type of RNN for natural language processing, called Elman network, is studied in this work. This article describes the use of a RNN to classify natural language encoded sentences by their grammatical status (grammatical or ungrammatical), using the encoding assumed by the Government and Binding theory of syntax (GB-theory) (Chomsky 1981). The objective of the RNN Copyright c © 2003, American Association for Artificial Intelligence (www.aaai.org). All rights reserved. is to produce the same judgements as native speakers on the grammatical/ungrammatical pairs, and to infer some representation of the grammar, expecting that the neural network learns in some way grammatical features. In recent years, diverse architectures of RNNs have been developed and used to extract automata for grammatical inference (Giles et al. 1992). In this work, based on the research of Lawrence, Giles, & Fong, the capacity of the Elman recurrent neural network (Elman 1991) to classify correctly natural language sentences is shown. The Elman network has been used previously in some natural language processing tasks (Stolcke 1990) and is used here, because of the good results found in its training (Lawrence, Giles, & Fong 2000). An implementation of the Backpropagation through time (BPTT) algorithm, specifically adapted to the problem, was developed and used to train the network, achieving an improved optimization of the algorithm convergence with respect to previous works (Lawrence, Giles, & Fong 2000; Roa & Nino 2001). Finally, the behaviour of the net as a dynamical system was analyzed to extract the knowledge acquired by the Elman neural net, in this case finite automata that represent some of the syntactic structures found in the language. The clustering algorithm called Growing Neural Gas network (Fritzke 1997) was used to reach these goals, obtaining improved results compared to anterior works. The rest of this article is organized as follows. First, the design of the classifier system is described, i.e., the Elman network, its topology and the training algorithm used. Then, the results of this classification are presented. Finally, the automata extraction process is explained in detail. Design of the classifier system The input data are encoded sentences of the English language. These examples were taken from the investigation described on the article of Lawrence and others (2000). The neural network was trained using the same examples, expecting the same judgements as native speakers on the grammatical/ungrammatical pairs. Consequently, without having any knowledge about the components assumed by the linguistic theory described above, in this case positive and negative examples are used, trying to exhibit the same kind of discriminatory power that the linguists have found using the GB-theory (Lawrence, Giles, & Fong 2000). The GB-theory assumes four primary lexical categories, which are verbs (v), nouns (n), adjectives (a) and prepositions (p). The other four classes are complementizer (c), determiner (det), adverb (adv) and marker (mrkr). Besides this categorization, a subcategorization of the classes is also defined, i.e., the categories are subcategorized depending on the context. Following the GB-theory and using a specific parser, the subcategorization was made by Lawrence, Giles, & Fong to obtain the encoded examples used in this work. For example, an intransitive verb, such as sleep, would be placed into a different class from the transitive verb hit. Similarly, verbs that take sentential complements or double objects, such as seem, give or persuade, would be representative of other classes. Hence, the encoding resulted in 8 classes, categorized as follows: 9 classes of verbs, 4 of nouns, 4 of adjectives, 2 of prepositions. The remaining classes do not have subcategorization. Table 1 shows some examples of sentences, their respective encoding and grammatical status, where 1 means grammatically correct and 0 incorrect. Sentence Encoding Grammatical
منابع مشابه
Dependency Sensitive Convolutional Neural Networks for Modeling Sentences and Documents
The goal of sentence and document modeling is to accurately represent the meaning of sentences and documents for various Natural Language Processing tasks. In this work, we present Dependency Sensitive Convolutional Neural Networks (DSCNN) as a generalpurpose classification system for both sentences and documents. DSCNN hierarchically builds textual representations by processing pretrained word...
متن کاملResearch on attention memory networks as a model for learning natural language inference
Natural Language Inference (NLI) is a fundamentally important task in natural language processing that has many applications. It is concerned with classifying the logical relation between two sentences. In this paper, we propose attention memory networks (AMNs) to recognize entailment and contradiction between two sentences. In our model, an attention memory neural network (AMNN) has a variable...
متن کاملParsing Natural Scenes and Natural Language with Recursive Neural Networks
Recursive structure is commonly found in the inputs of different modalities such as natural scene images or natural language sentences. Discovering this recursive structure helps us to not only identify the units that an image or sentence contains but also how they interact to form a whole. We introduce a max-margin structure prediction architecture based on recursive neural networks that can s...
متن کاملA Natural Language Translation Neural Network
proper translation by a user without any expert knowledge of how the computer stores and represents rules. This paper demonstrates the utility of neural networks in precisely this area on a small scale translation problem. We have tested the ability of neural networks to perform natural language translation. Our results have shown a greatly improved translation accuracy in comparison to the wor...
متن کاملRationale-Augmented Convolutional Neural Networks for Text Classification
We present a new Convolutional Neural Network (CNN) model for text classification that jointly exploits labels on documents and their constituent sentences. Specifically, we consider scenarios in which annotators explicitly mark sentences (or snippets) that support their overall document categorization, i.e., they provide rationales. Our model exploits such supervision via a hierarchical approa...
متن کامل